[WIP] Compile compatibility #789
# Conflicts: # tensordict/persistent.py
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 39.2330μs | 19.6395μs | 50.9177 KOps/s | 56.3278 KOps/s | |
test_plain_set_stack_nested | 49.6630μs | 19.8642μs | 50.3419 KOps/s | 56.3207 KOps/s | |
test_plain_set_nested_inplace | 50.0140μs | 20.9289μs | 47.7808 KOps/s | 50.5279 KOps/s | |
test_plain_set_stack_nested_inplace | 55.3030μs | 20.6476μs | 48.4317 KOps/s | 50.2862 KOps/s | |
test_items | 23.6140μs | 2.6180μs | 381.9779 KOps/s | 378.3066 KOps/s | |
test_items_nested | 2.2167ms | 1.0062ms | 993.8427 Ops/s | 3.6951 KOps/s | |
test_items_nested_locked | 1.6765ms | 0.9923ms | 1.0078 KOps/s | 3.7577 KOps/s | |
test_items_nested_leaf | 0.1239ms | 78.7698μs | 12.6952 KOps/s | 12.8590 KOps/s | |
test_items_stack_nested | 1.6257ms | 1.0024ms | 997.5718 Ops/s | 3.6990 KOps/s | |
test_items_stack_nested_leaf | 0.1533ms | 79.0894μs | 12.6439 KOps/s | 13.2799 KOps/s | |
test_items_stack_nested_locked | 1.6770ms | 0.9988ms | 1.0013 KOps/s | 3.7057 KOps/s | |
test_keys | 6.9790μs | 1.0835μs | 922.9771 KOps/s | 254.6147 KOps/s | |
test_keys_nested | 4.2549ms | 0.1412ms | 7.0845 KOps/s | 7.3408 KOps/s | |
test_keys_nested_locked | 0.3100ms | 0.1464ms | 6.8307 KOps/s | 7.0739 KOps/s | |
test_keys_nested_leaf | 0.2269ms | 0.1189ms | 8.4070 KOps/s | 8.5202 KOps/s | |
test_keys_stack_nested | 0.2408ms | 0.1402ms | 7.1329 KOps/s | 7.3895 KOps/s | |
test_keys_stack_nested_leaf | 0.1960ms | 0.1200ms | 8.3363 KOps/s | 8.6799 KOps/s | |
test_keys_stack_nested_locked | 0.2194ms | 0.1467ms | 6.8181 KOps/s | 7.1884 KOps/s | |
test_values | 11.6842μs | 1.2144μs | 823.4496 KOps/s | 855.0884 KOps/s | |
test_values_nested | 78.9470μs | 39.8091μs | 25.1199 KOps/s | 20.1042 KOps/s | |
test_values_nested_locked | 77.6260μs | 39.3189μs | 25.4331 KOps/s | 19.9375 KOps/s | |
test_values_nested_leaf | 92.7340μs | 34.7116μs | 28.8088 KOps/s | 22.1588 KOps/s | |
test_values_stack_nested | 95.5390μs | 40.1319μs | 24.9178 KOps/s | 19.6181 KOps/s | |
test_values_stack_nested_leaf | 79.2780μs | 35.0474μs | 28.5328 KOps/s | 22.5054 KOps/s | |
test_values_stack_nested_locked | 83.9770μs | 40.1158μs | 24.9278 KOps/s | 19.5471 KOps/s | |
test_membership | 2.3449μs | 0.2661μs | 3.7577 MOps/s | 741.3191 KOps/s | |
test_membership_nested | 0.9633ms | 2.9507μs | 338.9033 KOps/s | 290.3204 KOps/s | |
test_membership_nested_leaf | 43.5420μs | 2.8804μs | 347.1769 KOps/s | 289.1807 KOps/s | |
test_membership_stacked_nested | 25.5580μs | 2.8966μs | 345.2292 KOps/s | 268.1196 KOps/s | |
test_membership_stacked_nested_leaf | 24.3150μs | 2.9047μs | 344.2723 KOps/s | 285.2066 KOps/s | |
test_membership_nested_last | 41.8690μs | 3.8195μs | 261.8136 KOps/s | 241.7426 KOps/s | |
test_membership_nested_leaf_last | 35.0260μs | 3.8317μs | 260.9830 KOps/s | 244.2073 KOps/s | |
test_membership_stacked_nested_last | 48.4910μs | 5.7575μs | 173.6876 KOps/s | 75.9033 KOps/s | |
test_membership_stacked_nested_leaf_last | 38.4020μs | 5.8479μs | 171.0014 KOps/s | 75.1052 KOps/s | |
test_nested_getleaf | 53.9110μs | 13.6463μs | 73.2797 KOps/s | 93.6077 KOps/s | |
test_nested_get | 41.9380μs | 13.0242μs | 76.7802 KOps/s | 100.3202 KOps/s | |
test_stacked_getleaf | 53.8900μs | 13.5081μs | 74.0296 KOps/s | 92.6363 KOps/s | |
test_stacked_get | 57.6070μs | 12.9361μs | 77.3029 KOps/s | 99.3819 KOps/s | |
test_nested_getitemleaf | 53.4500μs | 14.2245μs | 70.3014 KOps/s | 89.0389 KOps/s | |
test_nested_getitem | 57.0970μs | 13.0898μs | 76.3954 KOps/s | 97.8701 KOps/s | |
test_stacked_getitemleaf | 57.9880μs | 14.0932μs | 70.9564 KOps/s | 89.7029 KOps/s | |
test_stacked_getitem | 54.3010μs | 13.1206μs | 76.2162 KOps/s | 99.0638 KOps/s | |
test_lock_nested | 1.8996ms | 0.3640ms | 2.7469 KOps/s | 2.7825 KOps/s | |
test_lock_stack_nested | 0.4534ms | 0.3172ms | 3.1521 KOps/s | 3.3073 KOps/s | |
test_unlock_nested | 86.0342ms | 0.4548ms | 2.1987 KOps/s | 2.3400 KOps/s | |
test_unlock_stack_nested | 0.4527ms | 0.3249ms | 3.0775 KOps/s | 3.2287 KOps/s | |
test_flatten_speed | 0.2451ms | 95.4212μs | 10.4798 KOps/s | 10.4381 KOps/s | |
test_unflatten_speed | 0.7815ms | 0.4433ms | 2.2557 KOps/s | 2.4600 KOps/s | |
test_common_ops | 4.4417ms | 0.7048ms | 1.4188 KOps/s | 1.3322 KOps/s | |
test_creation | 87.4130μs | 1.8934μs | 528.1584 KOps/s | 512.9202 KOps/s | |
test_creation_empty | 29.2040μs | 9.6150μs | 104.0044 KOps/s | 84.0416 KOps/s | |
test_creation_nested_1 | 40.2550μs | 13.0867μs | 76.4136 KOps/s | 68.0673 KOps/s | |
test_creation_nested_2 | 49.7630μs | 15.8172μs | 63.2222 KOps/s | 55.1494 KOps/s | |
test_clone | 0.2580ms | 13.2261μs | 75.6082 KOps/s | 71.6745 KOps/s | |
test_getitem[int] | 35.5060μs | 11.4884μs | 87.0442 KOps/s | 86.5559 KOps/s | |
test_getitem[slice_int] | 77.7260μs | 24.2233μs | 41.2825 KOps/s | 43.7737 KOps/s | |
test_getitem[range] | 95.5080μs | 44.9774μs | 22.2334 KOps/s | 15.8685 KOps/s | |
test_getitem[tuple] | 65.8730μs | 19.0115μs | 52.5998 KOps/s | 51.9421 KOps/s | |
test_getitem[list] | 0.1162ms | 40.2100μs | 24.8694 KOps/s | 23.3721 KOps/s | |
test_setitem_dim[int] | 49.6630μs | 29.5403μs | 33.8521 KOps/s | 26.5246 KOps/s | |
test_setitem_dim[slice_int] | 96.1910μs | 57.4017μs | 17.4211 KOps/s | 15.2924 KOps/s | |
test_setitem_dim[range] | 0.1533ms | 79.0319μs | 12.6531 KOps/s | 11.2082 KOps/s | |
test_setitem_dim[tuple] | 91.2500μs | 45.5052μs | 21.9755 KOps/s | 18.6747 KOps/s | |
test_setitem | 0.2309ms | 18.8329μs | 53.0985 KOps/s | 47.1920 KOps/s | |
test_set | 0.2309ms | 18.5304μs | 53.9655 KOps/s | 48.2493 KOps/s | |
test_set_shared | 4.4468ms | 0.1459ms | 6.8529 KOps/s | 6.4714 KOps/s | |
test_update | 0.2465ms | 20.3976μs | 49.0254 KOps/s | 43.1188 KOps/s | |
test_update_nested | 0.2736ms | 27.9278μs | 35.8066 KOps/s | 32.0033 KOps/s | |
test_update__nested | 0.2124ms | 25.3946μs | 39.3784 KOps/s | 39.4136 KOps/s | |
test_set_nested | 0.2024ms | 21.7490μs | 45.9791 KOps/s | 43.9382 KOps/s | |
test_set_nested_new | 0.8492ms | 25.4490μs | 39.2943 KOps/s | 37.3543 KOps/s | |
test_select | 0.2296ms | 42.3831μs | 23.5943 KOps/s | 24.1775 KOps/s | |
test_select_nested | 0.2969ms | 61.6501μs | 16.2206 KOps/s | 16.3932 KOps/s | |
test_exclude_nested | 0.1454ms | 74.4531μs | 13.4313 KOps/s | 8.2447 KOps/s | |
test_empty[True] | 0.5112ms | 0.2834ms | 3.5280 KOps/s | 2.5247 KOps/s | |
test_empty[False] | 12.1603μs | 1.1071μs | 903.2525 KOps/s | 906.3382 KOps/s | |
test_unbind_speed | 0.3023ms | 0.2575ms | 3.8836 KOps/s | 3.8754 KOps/s | |
test_unbind_speed_stack0 | 0.3647ms | 0.2547ms | 3.9263 KOps/s | 3.9612 KOps/s | |
test_unbind_speed_stack1 | 88.8440ms | 0.7019ms | 1.4247 KOps/s | 1.3034 KOps/s | |
test_split | 86.2926ms | 1.6102ms | 621.0411 Ops/s | 609.1152 Ops/s | |
test_chunk | 87.6960ms | 1.6159ms | 618.8487 Ops/s | 659.3399 Ops/s | |
test_creation[device0] | 0.2307ms | 86.3018μs | 11.5872 KOps/s | 11.6497 KOps/s | |
test_creation_from_tensor | 0.2538ms | 86.8008μs | 11.5206 KOps/s | 11.3516 KOps/s | |
test_add_one[memmap_tensor0] | 0.7411ms | 5.4633μs | 183.0395 KOps/s | 185.5124 KOps/s | |
test_contiguous[memmap_tensor0] | 13.6950μs | 0.6326μs | 1.5808 MOps/s | 1.5894 MOps/s | |
test_stack[memmap_tensor0] | 55.8940μs | 3.5632μs | 280.6445 KOps/s | 278.8074 KOps/s | |
test_memmaptd_index | 1.1647ms | 0.2573ms | 3.8866 KOps/s | 3.8885 KOps/s | |
test_memmaptd_index_astensor | 1.1103ms | 0.3444ms | 2.9035 KOps/s | 2.5051 KOps/s | |
test_memmaptd_index_op | 1.2659ms | 0.6119ms | 1.6342 KOps/s | 1.5151 KOps/s | |
test_serialize_model | 0.2025s | 0.1213s | 8.2427 Ops/s | 8.1973 Ops/s | |
test_serialize_model_pickle | 0.4520s | 0.3817s | 2.6198 Ops/s | 2.6144 Ops/s | |
test_serialize_weights | 0.2014s | 0.1181s | 8.4690 Ops/s | 8.3725 Ops/s | |
test_serialize_weights_returnearly | 0.2139s | 0.1412s | 7.0819 Ops/s | 7.6530 Ops/s | |
test_serialize_weights_pickle | 1.2470s | 0.6096s | 1.6404 Ops/s | 2.4440 Ops/s | |
test_serialize_weights_filesystem | 0.1005s | 93.6594ms | 10.6770 Ops/s | 9.6574 Ops/s | |
test_serialize_model_filesystem | 0.1032s | 96.2173ms | 10.3931 Ops/s | 9.7874 Ops/s | |
test_reshape_pytree | 62.2370μs | 25.0278μs | 39.9556 KOps/s | 38.5395 KOps/s | |
test_reshape_td | 74.5400μs | 33.2656μs | 30.0611 KOps/s | 30.4101 KOps/s | |
test_view_pytree | 65.5820μs | 25.0151μs | 39.9758 KOps/s | 39.7065 KOps/s | |
test_view_td | 84.2970μs | 37.0670μs | 26.9782 KOps/s | 27.0001 KOps/s | |
test_unbind_pytree | 72.7660μs | 29.2093μs | 34.2357 KOps/s | 34.4291 KOps/s | |
test_unbind_td | 0.4224ms | 38.6777μs | 25.8547 KOps/s | 26.0576 KOps/s | |
test_split_pytree | 74.9910μs | 29.2055μs | 34.2401 KOps/s | 34.4534 KOps/s | |
test_split_td | 0.5562ms | 40.8798μs | 24.4620 KOps/s | 24.4830 KOps/s | |
test_add_pytree | 89.3170μs | 34.3633μs | 29.1008 KOps/s | 28.1000 KOps/s | |
test_add_td | 0.1821ms | 56.6024μs | 17.6671 KOps/s | 16.5113 KOps/s | |
test_distributed | 0.2808ms | 0.1037ms | 9.6466 KOps/s | 9.5532 KOps/s | |
test_tdmodule | 41.4980μs | 15.3913μs | 64.9717 KOps/s | 51.2776 KOps/s | |
test_tdmodule_dispatch | 72.3250μs | 31.7916μs | 31.4548 KOps/s | 26.2583 KOps/s | |
test_tdseq | 44.4940μs | 16.7557μs | 59.6812 KOps/s | 45.9899 KOps/s | |
test_tdseq_dispatch | 0.1146ms | 36.6190μs | 27.3082 KOps/s | 23.7610 KOps/s | |
test_instantiation_functorch | 89.0834ms | 1.4957ms | 668.5699 Ops/s | 745.2147 Ops/s | |
test_instantiation_td | 1.7813ms | 1.0715ms | 933.2562 Ops/s | 959.0876 Ops/s | |
test_exec_functorch | 0.3578ms | 0.1650ms | 6.0594 KOps/s | 6.1246 KOps/s | |
test_exec_functional_call | 0.3149ms | 0.1506ms | 6.6380 KOps/s | 6.6391 KOps/s | |
test_exec_td | 0.2358ms | 0.1458ms | 6.8600 KOps/s | 6.6511 KOps/s | |
test_exec_td_decorator | 0.9370ms | 0.2372ms | 4.2164 KOps/s | 4.5175 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.8070ms | 0.4864ms | 2.0559 KOps/s | 2.0142 KOps/s | |
test_vmap_mlp_speed[True-False] | 1.4318ms | 0.4866ms | 2.0552 KOps/s | 2.0185 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.7875ms | 0.4109ms | 2.4338 KOps/s | 2.5000 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7459ms | 0.3984ms | 2.5098 KOps/s | 2.5018 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 3.7264ms | 0.5696ms | 1.7556 KOps/s | 1.7716 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8161ms | 0.5668ms | 1.7642 KOps/s | 1.7664 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 1.0138ms | 0.4734ms | 2.1122 KOps/s | 2.1557 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7826ms | 0.4753ms | 2.1038 KOps/s | 2.1506 KOps/s | |
test_to_module_speed[True] | 2.4661ms | 1.8730ms | 533.9168 Ops/s | 591.8877 Ops/s | |
test_to_module_speed[False] | 2.7274ms | 1.8168ms | 550.4183 Ops/s | 606.9606 Ops/s | |
test_tc_init | 53.3400μs | 23.9986μs | 41.6690 KOps/s | 14.6495 KOps/s | |
test_tc_init_nested | 0.1145ms | 49.3863μs | 20.2485 KOps/s | 7.1723 KOps/s | |
test_tc_first_layer_tensor | 47.2380μs | 1.3390μs | 746.8419 KOps/s | 167.3070 KOps/s | |
test_tc_first_layer_nontensor | 22.4020μs | 1.3551μs | 737.9591 KOps/s | 167.1346 KOps/s | |
test_tc_second_layer_tensor | 35.6970μs | 1.6018μs | 624.3163 KOps/s | 87.3421 KOps/s | |
test_tc_second_layer_nontensor | 39.9150μs | 2.0329μs | 491.9056 KOps/s | 86.3120 KOps/s | |
test_unbind | 0.1054s | 6.8174ms | 146.6837 Ops/s | 92.3175 Ops/s | |
test_full_like | 17.1318ms | 11.5663ms | 86.4582 Ops/s | 82.8618 Ops/s | |
test_zeros_like | 11.2756ms | 6.0729ms | 164.6654 Ops/s | 169.2976 Ops/s | |
test_ones_like | 12.1817ms | 6.5507ms | 152.6544 Ops/s | 171.4725 Ops/s | |
test_clone | 13.0328ms | 8.6778ms | 115.2361 Ops/s | 125.2599 Ops/s | |
test_squeeze | 0.2091ms | 12.6299μs | 79.1772 KOps/s | 35.9712 KOps/s | |
test_unsqueeze | 0.2059ms | 65.2522μs | 15.3252 KOps/s | 10.0471 KOps/s | |
test_split | 0.3634ms | 0.1080ms | 9.2594 KOps/s | 6.0462 KOps/s | |
test_permute | 0.3979ms | 0.1309ms | 7.6407 KOps/s | 5.5597 KOps/s | |
test_stack | 30.5953ms | 24.2135ms | 41.2994 Ops/s | 40.8224 Ops/s | |
test_cat | 33.6165ms | 24.5304ms | 40.7658 Ops/s | 40.5558 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.5120ms | 16.5439μs | 60.4453 KOps/s | 85.4082 KOps/s | |
test_plain_set_stack_nested | 36.6000μs | 16.6309μs | 60.1289 KOps/s | 85.4786 KOps/s | |
test_plain_set_nested_inplace | 47.2210μs | 17.0188μs | 58.7586 KOps/s | 77.3594 KOps/s | |
test_plain_set_stack_nested_inplace | 37.9210μs | 17.1269μs | 58.3876 KOps/s | 76.5674 KOps/s | |
test_items | 21.4000μs | 4.6021μs | 217.2927 KOps/s | 211.2790 KOps/s | |
test_items_nested | 1.8565ms | 0.9813ms | 1.0190 KOps/s | 2.9533 KOps/s | |
test_items_nested_locked | 1.0520ms | 0.9948ms | 1.0052 KOps/s | 2.9313 KOps/s | |
test_items_nested_leaf | 0.1095ms | 84.1847μs | 11.8786 KOps/s | 12.0914 KOps/s | |
test_items_stack_nested | 1.0593ms | 0.9790ms | 1.0215 KOps/s | 2.9374 KOps/s | |
test_items_stack_nested_leaf | 0.1106ms | 88.0183μs | 11.3613 KOps/s | 11.9440 KOps/s | |
test_items_stack_nested_locked | 0.9979ms | 0.9799ms | 1.0206 KOps/s | 2.9182 KOps/s | |
test_keys | 6.8700μs | 1.8170μs | 550.3711 KOps/s | 214.0092 KOps/s | |
test_keys_nested | 91.8420μs | 65.9254μs | 15.1687 KOps/s | 14.7820 KOps/s | |
test_keys_nested_locked | 89.7630μs | 70.5922μs | 14.1659 KOps/s | 13.8596 KOps/s | |
test_keys_nested_leaf | 81.0520μs | 56.6082μs | 17.6653 KOps/s | 17.2202 KOps/s | |
test_keys_stack_nested | 84.5020μs | 65.4832μs | 15.2711 KOps/s | 15.0067 KOps/s | |
test_keys_stack_nested_leaf | 81.5810μs | 56.1914μs | 17.7963 KOps/s | 17.3739 KOps/s | |
test_keys_stack_nested_locked | 0.1153ms | 70.2529μs | 14.2343 KOps/s | 14.0083 KOps/s | |
test_values | 9.1270μs | 1.7761μs | 563.0336 KOps/s | 546.5834 KOps/s | |
test_values_nested | 41.1710μs | 25.7379μs | 38.8532 KOps/s | 28.4864 KOps/s | |
test_values_nested_locked | 44.3010μs | 27.5811μs | 36.2567 KOps/s | 27.0964 KOps/s | |
test_values_nested_leaf | 88.8310μs | 21.8252μs | 45.8187 KOps/s | 31.9156 KOps/s | |
test_values_stack_nested | 45.4710μs | 26.0624μs | 38.3695 KOps/s | 27.6151 KOps/s | |
test_values_stack_nested_leaf | 45.1710μs | 22.1288μs | 45.1900 KOps/s | 31.1179 KOps/s | |
test_values_stack_nested_locked | 45.6310μs | 28.1055μs | 35.5803 KOps/s | 26.5295 KOps/s | |
test_membership | 0.8217μs | 0.1938μs | 5.1606 MOps/s | 1.3849 MOps/s | |
test_membership_nested | 0.9971ms | 2.2112μs | 452.2437 KOps/s | 389.8364 KOps/s | |
test_membership_nested_leaf | 10.9950μs | 2.1119μs | 473.5068 KOps/s | 389.1113 KOps/s | |
test_membership_stacked_nested | 30.5200μs | 2.1719μs | 460.4211 KOps/s | 383.0130 KOps/s | |
test_membership_stacked_nested_leaf | 17.3700μs | 2.1414μs | 466.9820 KOps/s | 392.0930 KOps/s | |
test_membership_nested_last | 21.4100μs | 2.7985μs | 357.3371 KOps/s | 321.7233 KOps/s | |
test_membership_nested_leaf_last | 20.4410μs | 2.8329μs | 352.9998 KOps/s | 325.2331 KOps/s | |
test_membership_stacked_nested_last | 39.4810μs | 11.0739μs | 90.3023 KOps/s | 101.8798 KOps/s | |
test_membership_stacked_nested_leaf_last | 86.6420μs | 11.0413μs | 90.5690 KOps/s | 101.9397 KOps/s | |
test_nested_getleaf | 26.6410μs | 11.1623μs | 89.5873 KOps/s | 120.3027 KOps/s | |
test_nested_get | 31.9310μs | 10.5475μs | 94.8089 KOps/s | 127.7897 KOps/s | |
test_stacked_getleaf | 27.2800μs | 11.1383μs | 89.7803 KOps/s | 119.0125 KOps/s | |
test_stacked_get | 28.0700μs | 10.5547μs | 94.7447 KOps/s | 127.0954 KOps/s | |
test_nested_getitemleaf | 28.9310μs | 11.3491μs | 88.1129 KOps/s | 118.1388 KOps/s | |
test_nested_getitem | 25.1700μs | 10.7171μs | 93.3085 KOps/s | 125.0526 KOps/s | |
test_stacked_getitemleaf | 27.1710μs | 11.2939μs | 88.5433 KOps/s | 117.0916 KOps/s | |
test_stacked_getitem | 27.5910μs | 10.6897μs | 93.5479 KOps/s | 125.2583 KOps/s | |
test_lock_nested | 84.3050ms | 0.4391ms | 2.2776 KOps/s | 2.4532 KOps/s | |
test_lock_stack_nested | 0.3321ms | 0.3018ms | 3.3131 KOps/s | 3.3581 KOps/s | |
test_unlock_nested | 0.7369ms | 0.3537ms | 2.8276 KOps/s | 2.8505 KOps/s | |
test_unlock_stack_nested | 0.3403ms | 0.3107ms | 3.2180 KOps/s | 3.2669 KOps/s | |
test_flatten_speed | 0.1905ms | 0.1032ms | 9.6919 KOps/s | 9.7813 KOps/s | |
test_unflatten_speed | 0.4454ms | 0.3074ms | 3.2534 KOps/s | 3.4522 KOps/s | |
test_common_ops | 0.8915ms | 0.5758ms | 1.7368 KOps/s | 1.8446 KOps/s | |
test_creation | 31.1800μs | 1.5210μs | 657.4745 KOps/s | 618.8354 KOps/s | |
test_creation_empty | 26.3200μs | 9.7380μs | 102.6906 KOps/s | 150.0572 KOps/s | |
test_creation_nested_1 | 31.5700μs | 12.0299μs | 83.1264 KOps/s | 118.4040 KOps/s | |
test_creation_nested_2 | 40.1300μs | 13.3170μs | 75.0920 KOps/s | 93.3811 KOps/s | |
test_clone | 72.9320μs | 11.4990μs | 86.9642 KOps/s | 85.9959 KOps/s | |
test_getitem[int] | 32.9310μs | 10.5140μs | 95.1113 KOps/s | 91.8083 KOps/s | |
test_getitem[slice_int] | 42.4110μs | 21.3795μs | 46.7739 KOps/s | 49.1679 KOps/s | |
test_getitem[range] | 0.1712ms | 35.7877μs | 27.9426 KOps/s | 21.9333 KOps/s | |
test_getitem[tuple] | 37.7010μs | 17.7869μs | 56.2210 KOps/s | 54.8434 KOps/s | |
test_getitem[list] | 0.1863ms | 31.0698μs | 32.1856 KOps/s | 31.9824 KOps/s | |
test_setitem_dim[int] | 42.9910μs | 26.6080μs | 37.5826 KOps/s | 38.7587 KOps/s | |
test_setitem_dim[slice_int] | 82.9220μs | 48.3387μs | 20.6874 KOps/s | 21.6153 KOps/s | |
test_setitem_dim[range] | 0.1091ms | 62.7405μs | 15.9387 KOps/s | 15.4653 KOps/s | |
test_setitem_dim[tuple] | 60.8110μs | 40.7177μs | 24.5593 KOps/s | 24.9477 KOps/s | |
test_setitem | 76.0320μs | 17.0013μs | 58.8189 KOps/s | 65.7480 KOps/s | |
test_set | 75.3820μs | 16.4635μs | 60.7405 KOps/s | 67.5155 KOps/s | |
test_set_shared | 3.0868ms | 96.7571μs | 10.3352 KOps/s | 9.9131 KOps/s | |
test_update | 88.0220μs | 19.2612μs | 51.9178 KOps/s | 62.0546 KOps/s | |
test_update_nested | 95.1310μs | 23.9470μs | 41.7589 KOps/s | 47.2324 KOps/s | |
test_update__nested | 71.8220μs | 22.5794μs | 44.2881 KOps/s | 45.1775 KOps/s | |
test_set_nested | 94.3710μs | 18.1981μs | 54.9508 KOps/s | 62.7756 KOps/s | |
test_set_nested_new | 81.0620μs | 20.8046μs | 48.0663 KOps/s | 54.4475 KOps/s | |
test_select | 99.1120μs | 34.1060μs | 29.3203 KOps/s | 30.9369 KOps/s | |
test_select_nested | 70.9310μs | 53.4647μs | 18.7039 KOps/s | 18.6314 KOps/s | |
test_exclude_nested | 94.0820μs | 68.3613μs | 14.6282 KOps/s | 9.0119 KOps/s | |
test_empty[True] | 0.3732ms | 0.2703ms | 3.6993 KOps/s | 2.8582 KOps/s | |
test_empty[False] | 3.0421μs | 0.8537μs | 1.1714 MOps/s | 1.1499 MOps/s | |
test_to | 0.1009ms | 75.4189μs | 13.2593 KOps/s | 13.2257 KOps/s | |
test_to_nonblocking | 87.8010μs | 60.8810μs | 16.4255 KOps/s | 16.4796 KOps/s | |
test_unbind_speed | 0.3030ms | 0.2672ms | 3.7423 KOps/s | 3.7598 KOps/s | |
test_unbind_speed_stack0 | 0.3093ms | 0.2648ms | 3.7760 KOps/s | 3.7765 KOps/s | |
test_unbind_speed_stack1 | 0.7110ms | 0.6725ms | 1.4870 KOps/s | 1.2720 KOps/s | |
test_split | 86.8956ms | 1.6337ms | 612.1226 Ops/s | 611.7728 Ops/s | |
test_chunk | 84.8357ms | 1.6244ms | 615.6265 Ops/s | 613.4545 Ops/s | |
test_creation[device0] | 0.1249ms | 58.3538μs | 17.1369 KOps/s | 17.7474 KOps/s | |
test_creation_from_tensor | 0.1267ms | 56.0169μs | 17.8518 KOps/s | 18.7027 KOps/s | |
test_add_one[memmap_tensor0] | 86.3710μs | 7.3831μs | 135.4442 KOps/s | 142.5487 KOps/s | |
test_contiguous[memmap_tensor0] | 11.7110μs | 0.6241μs | 1.6024 MOps/s | 1.4846 MOps/s | |
test_stack[memmap_tensor0] | 36.8010μs | 4.7164μs | 212.0251 KOps/s | 215.6746 KOps/s | |
test_memmaptd_index | 1.1119ms | 0.2882ms | 3.4693 KOps/s | 3.5072 KOps/s | |
test_memmaptd_index_astensor | 0.6444ms | 0.3564ms | 2.8055 KOps/s | 2.7967 KOps/s | |
test_memmaptd_index_op | 1.0255ms | 0.6905ms | 1.4483 KOps/s | 1.6165 KOps/s | |
test_serialize_model | 0.1931s | 0.1122s | 8.9096 Ops/s | 8.6867 Ops/s | |
test_serialize_model_pickle | 1.3506s | 1.2354s | 0.8095 Ops/s | 0.8083 Ops/s | |
test_serialize_weights | 0.1891s | 0.1101s | 9.0787 Ops/s | 8.8262 Ops/s | |
test_serialize_weights_returnearly | 0.2448s | 97.7548ms | 10.2297 Ops/s | 10.6156 Ops/s | |
test_serialize_weights_pickle | 1.3805s | 1.2522s | 0.7986 Ops/s | 0.7989 Ops/s | |
test_reshape_pytree | 59.9510μs | 25.4479μs | 39.2960 KOps/s | 37.3536 KOps/s | |
test_reshape_td | 54.7110μs | 29.3470μs | 34.0750 KOps/s | 32.9111 KOps/s | |
test_view_pytree | 48.8310μs | 25.2335μs | 39.6299 KOps/s | 38.7542 KOps/s | |
test_view_td | 65.9110μs | 33.0422μs | 30.2643 KOps/s | 29.7415 KOps/s | |
test_unbind_pytree | 60.4510μs | 31.6037μs | 31.6419 KOps/s | 31.4949 KOps/s | |
test_unbind_td | 0.4469ms | 39.8041μs | 25.1230 KOps/s | 24.6439 KOps/s | |
test_split_pytree | 59.7030μs | 33.6530μs | 29.7150 KOps/s | 27.4868 KOps/s | |
test_split_td | 0.2492ms | 37.9331μs | 26.3622 KOps/s | 25.9181 KOps/s | |
test_add_pytree | 0.1903ms | 40.9472μs | 24.4217 KOps/s | 26.2077 KOps/s | |
test_add_td | 0.1995ms | 51.5066μs | 19.4150 KOps/s | 22.1697 KOps/s | |
test_distributed | 1.9274ms | 88.7322μs | 11.2699 KOps/s | 11.2727 KOps/s | |
test_tdmodule | 47.3800μs | 13.3927μs | 74.6674 KOps/s | 71.6806 KOps/s | |
test_tdmodule_dispatch | 45.2010μs | 27.7505μs | 36.0354 KOps/s | 36.3467 KOps/s | |
test_tdseq | 30.4200μs | 15.1085μs | 66.1880 KOps/s | 61.9801 KOps/s | |
test_tdseq_dispatch | 47.9810μs | 30.8487μs | 32.4163 KOps/s | 32.7493 KOps/s | |
test_instantiation_functorch | 1.6425ms | 1.5227ms | 656.7258 Ops/s | 660.4085 Ops/s | |
test_instantiation_td | 1.5037ms | 1.0443ms | 957.5773 Ops/s | 867.1305 Ops/s | |
test_exec_functorch | 0.1929ms | 0.1536ms | 6.5093 KOps/s | 6.5609 KOps/s | |
test_exec_functional_call | 0.1900ms | 0.1416ms | 7.0607 KOps/s | 7.0605 KOps/s | |
test_exec_td | 0.1923ms | 0.1412ms | 7.0829 KOps/s | 7.0909 KOps/s | |
test_exec_td_decorator | 0.3264ms | 0.2198ms | 4.5494 KOps/s | 4.7351 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.0505ms | 0.5992ms | 1.6688 KOps/s | 1.6405 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7614ms | 0.6004ms | 1.6655 KOps/s | 1.6418 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.6050ms | 0.5363ms | 1.8646 KOps/s | 1.7914 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.5923ms | 0.5310ms | 1.8833 KOps/s | 1.7890 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.7455ms | 0.6675ms | 1.4982 KOps/s | 1.4564 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.7593ms | 0.6712ms | 1.4898 KOps/s | 1.4480 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7696ms | 0.5980ms | 1.6721 KOps/s | 1.6301 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7497ms | 0.6008ms | 1.6644 KOps/s | 1.6657 KOps/s | |
test_vmap_transformer_speed[True-True] | 8.3890ms | 8.0166ms | 124.7418 Ops/s | 123.4568 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.2281ms | 8.0067ms | 124.8957 Ops/s | 124.2517 Ops/s | |
test_vmap_transformer_speed[False-True] | 8.1026ms | 7.9549ms | 125.7095 Ops/s | 125.5127 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.8096ms | 7.9356ms | 126.0146 Ops/s | 126.2385 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.7115ms | 19.5741ms | 51.0880 Ops/s | 51.5006 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.6626ms | 19.5175ms | 51.2361 Ops/s | 51.5013 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.5896ms | 19.4525ms | 51.4073 Ops/s | 51.7440 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.8407ms | 19.4408ms | 51.4382 Ops/s | 47.4014 Ops/s | |
test_to_module_speed[True] | 1.7045ms | 1.6031ms | 623.7755 Ops/s | 655.4987 Ops/s | |
test_to_module_speed[False] | 1.7089ms | 1.6047ms | 623.1802 Ops/s | 651.4367 Ops/s | |
test_tc_init | 43.2110μs | 26.4352μs | 37.8283 KOps/s | 20.5534 KOps/s | |
test_tc_init_nested | 85.2710μs | 53.1763μs | 18.8054 KOps/s | 9.6094 KOps/s | |
test_tc_first_layer_tensor | 14.3300μs | 0.7476μs | 1.3376 MOps/s | 220.4013 KOps/s | |
test_tc_first_layer_nontensor | 1.6725μs | 0.6578μs | 1.5203 MOps/s | 219.9563 KOps/s | |
test_tc_second_layer_tensor | 1.3870μs | 0.7137μs | 1.4011 MOps/s | 119.0046 KOps/s | |
test_tc_second_layer_nontensor | 4.3502μs | 0.9765μs | 1.0241 MOps/s | 114.8002 KOps/s | |
test_unbind | 95.5513ms | 7.5917ms | 131.7224 Ops/s | 102.7390 Ops/s | |
test_full_like | 12.1830ms | 11.4297ms | 87.4917 Ops/s | 85.4965 Ops/s | |
test_zeros_like | 8.0343ms | 7.8430ms | 127.5026 Ops/s | 142.2836 Ops/s | |
test_ones_like | 8.1541ms | 7.8699ms | 127.0663 Ops/s | 142.1070 Ops/s | |
test_clone | 9.7204ms | 9.4910ms | 105.3631 Ops/s | 104.4180 Ops/s | |
test_squeeze | 51.6110μs | 9.7683μs | 102.3723 KOps/s | 43.1811 KOps/s | |
test_unsqueeze | 0.1917ms | 56.7205μs | 17.6303 KOps/s | 11.1491 KOps/s | |
test_split | 0.2291ms | 91.9514μs | 10.8753 KOps/s | 6.6099 KOps/s | |
test_permute | 0.1834ms | 0.1140ms | 8.7713 KOps/s | 5.9934 KOps/s | |
test_stack | 29.1505ms | 27.6347ms | 36.1863 Ops/s | 35.9637 Ops/s | |
test_cat | 28.0752ms | 27.6008ms | 36.2308 Ops/s | 36.0391 Ops/s |
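
The Change column is empty in both tables. Below is a minimal sketch for recomputing it from the two throughput columns. It assumes the relative change is defined as Ops / Ops on Repo HEAD - 1; the definition used by the benchmark bot is not shown here. `parse_ops` and `relative_change` are hypothetical helper names.

```python
# Multipliers for the throughput units that appear in the tables.
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '1.0190 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def relative_change(pr_ops: str, head_ops: str) -> float:
    """Relative throughput change of the PR against repo HEAD (assumed definition)."""
    return parse_ops(pr_ops) / parse_ops(head_ops) - 1.0


# Row test_items_nested from the second table: the PR is slower than HEAD.
slowdown = relative_change("1.0190 KOps/s", "2.9533 KOps/s")
# Row test_keys from the second table: the PR is faster than HEAD.
speedup = relative_change("550.3711 KOps/s", "214.0092 KOps/s")
print(f"{slowdown:+.1%} {speedup:+.1%}")
```

A negative value marks a regression against HEAD, a positive value a speedup.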
# Conflicts: # tensordict/tensorclass.py
    return tuple(subk for k in key for subk in _unravel_key_to_tuple(k))


def _slice_indices(index: slice, len: int):
This can go into PyTorch core `_dynamo/polyfill.py` and then be inlined in dynamo.
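
For reference, a pure-Python helper of this kind could look like the sketch below. The body of `_slice_indices` is not shown in this thread, so this is an assumption about what it computes: the same `(start, stop, step)` triple as the built-in `slice.indices(len)`, written with plain integer arithmetic. The name `slice_indices_sketch` is hypothetical.

```python
def slice_indices_sketch(index: slice, length: int) -> tuple:
    """Pure-Python equivalent of index.indices(length) (assumed behaviour)."""
    step = 1 if index.step is None else index.step
    if step == 0:
        raise ValueError("slice step cannot be zero")

    # Bounds depend on the direction of travel, as in CPython.
    if step > 0:
        lower, upper = 0, length
    else:
        lower, upper = -1, length - 1

    def clamp(value, default):
        # None takes the direction-dependent default; negative values wrap once.
        if value is None:
            return default
        if value < 0:
            return max(value + length, lower)
        return min(value, upper)

    start = clamp(index.start, lower if step > 0 else upper)
    stop = clamp(index.stop, upper if step > 0 else lower)
    return start, stop, step


print(slice_indices_sketch(slice(None, None, -1), 5))
print(slice_indices_sketch(slice(-3, 100), 5))
```

Because it uses only integer comparisons and arithmetic, a helper like this is the sort of function that can live in a polyfill module.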
if not torch.compiler.is_dynamo_compiling():
    _tensordict = __dict__.get("_tensordict")
else:
    _tensordict = self._tensordict
I should eventually try to land pytorch/pytorch#118995...
Yeah
For context, since tensordict can be very recursive with a lot of attributes hidden behind some hacky checks within `__getattr__` and similar, we sometimes hack our way through getting the `__dict__` and directly gathering the variable we're looking for.
I guess that if you use compile this makes less sense so I'm happy to fall back on a regular `getattr` if that makes dynamo happy
if default not in (None, dataclasses.MISSING):
    kwargs.setdefault(key, default)
else:
    # TODO: Decide what to do here
Dynamo doesn't want us to iterate over `self.__dataclass_fields__.items()`:
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1855, in CALL
    self.call_function(fn, args, kwargs)
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 739, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/variables/misc.py", line 668, in call_function
    return self.obj.call_method(tx, self.name, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/variables/misc.py", line 714, in call_method
    return super().call_method(tx, name, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/variables/base.py", line 320, in call_method
    unimplemented(f"call_method {self} {name} {args} {kwargs}")
  File "/Users/vmoens/venv/rl/lib/python3.11/site-packages/torch/_dynamo/exc.py", line 216, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: call_method GetAttrVariable(UserDefinedObjectVariable(MyClass), __dataclass_fields__) items [] {}
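
One possible workaround, sketched below, is to read `__dataclass_fields__` once in eager mode and keep the defaults in a plain dict, so that the per-call path only iterates over that dict. This assumes field defaults do not change after class creation; whether Dynamo then traces the loop depends on the PyTorch version, and this thread does not confirm that the PR takes this route. `collect_defaults`, `fill_defaults` and `MYCLASS_DEFAULTS` are hypothetical names, and `MyClass` is only borrowed from the error message.

```python
import dataclasses


@dataclasses.dataclass
class MyClass:
    a: int
    b: int = 2
    c: str = "x"


def collect_defaults(cls) -> dict:
    """Read the dataclass fields once, in eager mode, and keep only real defaults."""
    return {
        name: field.default
        for name, field in cls.__dataclass_fields__.items()
        if field.default is not None and field.default is not dataclasses.MISSING
    }


# Computed at import time, outside of any compiled function.
MYCLASS_DEFAULTS = collect_defaults(MyClass)


def fill_defaults(kwargs: dict, defaults: dict) -> dict:
    """Mirror the kwargs.setdefault(key, default) loop from the diff on a plain dict."""
    for key, default in defaults.items():
        kwargs.setdefault(key, default)
    return kwargs


obj = MyClass(**fill_defaults({"a": 1}, MYCLASS_DEFAULTS))
```

The `else` branch from the diff (fields with no usable default) is left out here, since the TODO above shows it is still undecided.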
    tc.__post_init__()
    return tc
else:
    # TODO: things that did NOT work: **tensordict, dict(tensordict)
All of these (`dict(tensordict)` or `**tensordict`) are valid syntaxes but only `dict(tensordict.items())` worked
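
For comparison, the three spellings are interchangeable in eager Python for any object that implements the mapping protocol. The sketch below uses a hypothetical `TinyMapping` class in place of a tensordict and says nothing about which form Dynamo can trace.

```python
from collections.abc import Mapping


class TinyMapping(Mapping):
    """Hypothetical minimal mapping, standing in for a tensordict."""

    def __init__(self, **data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def as_kwargs(**kwargs):
    # Returns whatever was unpacked into the call.
    return kwargs


m = TinyMapping(a=1, b=2)
via_dict = dict(m)             # uses keys() and __getitem__
via_unpack = as_kwargs(**m)    # uses keys() and __getitem__
via_items = dict(m.items())    # iterates over (key, value) pairs
```

The last form only needs iteration over pairs, which may be why it was the one that compiled at the time.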
@@ -1802,8 +1880,9 @@ def _unbind(self, dim: int):
    Resulting tensorclass instances will share the storage of the initial tensorclass instance.

    """
    # TODO: dynamo doesn't like copy, using dict instead
Interestingly `copy` doesn't work but `dict(another_dict)` does
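
In eager mode the two are equivalent shallow copies, which is why the swap is safe. A small sketch with a plain dict (the variable names are illustrative only):

```python
import copy

original = {"a": [1, 2], "b": [3]}

via_copy = copy.copy(original)  # shallow copy through the copy module
via_dict = dict(original)       # shallow copy through the dict constructor

# Both results are new dict objects whose values are shared with the original.
shares_values = via_dict["a"] is original["a"] and via_copy["a"] is original["a"]
```

Sharing the values is consistent with the docstring above: the resulting instances share the storage of the initial one.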
Can you please open a tracking issue in
# Conflicts: # tensordict/base.py # tensordict/tensorclass.py
# Conflicts: # tensordict/tensorclass.py # tensordict/utils.py
PT tracking issue: pytorch/pytorch#129668
# Conflicts: # tensordict/_td.py # tensordict/_torch_func.py # tensordict/base.py # tensordict/tensorclass.py # tensordict/utils.py
# Conflicts: # tensordict/tensorclass.py
# Conflicts: # tensordict/_lazy.py # tensordict/nn/sequence.py # tensordict/tensorclass.py # tensordict/utils.py
The goal of this PR is to make tensordict compatible with torch.compile without any regression.
We have reached good coverage so far, although tensorclasses and functional calls still need more support.
Enabling torch.compile will achieve two goals:
Speedups (local)
Compatible features:
WIP (compatible through tricks in compile / td, or requires code adaptation)
`__torch_function__` breaks: `stack` and `cat` need to be executed through `TensorDict.stack`
Not compatible
Related PRs / Issues
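`super().__getattr__` nn.Module in dynamo
`torch.stack` and similar are ubiquitous in tensordict. The fix on the TD side is straightforward but doesn't provide a great UX (because the error from compile isn't very clear and the solution isn't very elegant).

cc @jsuarez5341 @janblumenkamp @btx0424 @soumith @ezyang @matteobettini @albertbou92 @BY571 @Miffyli @teopir @nairbv @luisenp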
and similar are ubiquituous in tensordict. The fix on TD side is straightforward but doesn't provide a great UX (because the error from compile isn't very clear and the solution isn't very elegant).cc @jsuarez5341 @janblumenkamp @btx0424 @soumith @ezyang @matteobettini @albertbou92 @BY571 @Miffyli @teopir @nairbv @luisenp